Model Selection for Mixtures of Factor Analyzers via Hierarchical BIC
Abstract
The Bayesian information criterion (BIC) is a common model selection criterion for mixtures of factor analyzers (MFA). However, BIC penalizes each factor analyzer using the whole sample size, which is implausible. In this paper, we propose a new criterion for MFA called hierarchical BIC (H-BIC). Formally, the main difference from BIC is that H-BIC penalizes each factor analyzer using only its own effective sample size. Theoretically, we show that H-BIC is a large-sample approximation of the variational Bayesian (VB) lower bound and that BIC is a further approximation of H-BIC. Additionally, to apply H-BIC efficiently, we propose a novel algorithm that does not use H-BIC as a criterion to choose one model from a set of candidates with different latent dimensions; rather, it integrates the determination of latent dimensions into parameter estimation for a given number of components. Consequently, this algorithm only requires choosing one model from the much smaller set of candidates with different numbers of components. Experiments on a number of synthetic and real data sets reveal that (i) H-BIC is more accurate than BIC and several existing competing methods, and (ii) the proposed algorithm is much more efficient than the one usually used for BIC.
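The difference between the two penalties can be illustrated with a small sketch. The abstract does not state the formulas, so the following is only one plausible reading: it assumes each factor analyzer with latent dimension q in d-dimensional data has d(q+2) - q(q-1)/2 free parameters (mean, loadings up to rotation, diagonal noise), that BIC charges all parameters log(n), and that H-BIC charges each component's parameters log(n * weight), where n * weight stands in for the component's effective sample size. The function names are hypothetical and the exact form in the paper may differ.

```python
import math


def fa_free_params(d, q):
    """Free parameters of one factor analyzer (assumed parameterization):
    mean (d) + loadings (d*q - q*(q-1)/2 after removing rotation) + diagonal noise (d)."""
    return d * (q + 2) - q * (q - 1) // 2


def bic_penalty(n, d, qs):
    """BIC-style penalty: every parameter is charged log(n), the whole sample size."""
    g = len(qs)
    total = sum(fa_free_params(d, q) for q in qs) + (g - 1)  # g - 1 mixing weights
    return 0.5 * total * math.log(n)


def hbic_penalty(n, d, qs, weights):
    """H-BIC-style penalty (assumed form): each component's parameters are charged
    log(n * weight), its effective sample size; mixing weights are charged log(n)."""
    g = len(qs)
    penalty = 0.5 * (g - 1) * math.log(n)
    for q, w in zip(qs, weights):
        penalty += 0.5 * fa_free_params(d, q) * math.log(n * w)
    return penalty


if __name__ == "__main__":
    n, d = 1000, 10
    qs = [2, 2]            # latent dimension of each component
    weights = [0.5, 0.5]   # estimated mixing proportions
    print("BIC penalty:  ", bic_penalty(n, d, qs))
    print("H-BIC penalty:", hbic_penalty(n, d, qs, weights))
```

Under these assumptions the two penalties coincide for a single component and diverge as components become smaller, which matches the abstract's statement that BIC is a further approximation of H-BIC.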
Similar resources
Extending mixtures of multivariate t-factor analyzers
Model-based clustering typically involves the development of a family of mixture models and the imposition of these models upon data. The best member of the family is then chosen using some criterion and the associated parameter estimates lead to predicted group memberships, or clusterings. This paper describes the extension of the mixtures of multivariate t-factor analyzers model to include co...
Adaptive Mixtures of Factor Analyzers
A mixture of factor analyzers is a semi-parametric density estimator that generalizes the well-known mixtures of Gaussians model by allowing each Gaussian in the mixture to be represented in a different lower-dimensional manifold. This paper presents a robust and parsimonious model selection algorithm for training a mixture of factor analyzers, carrying out simultaneous clustering and locally l...
Clustering via the Bayesian information criterion with applications in speech recognition
One difficult problem we are often faced with in clustering analysis is how to choose the number of clusters. In this paper, we propose to choose the number of clusters by optimizing the Bayesian information criterion (BIC), a model selection criterion in the statistics literature. We develop a termination criterion for the hierarchical clustering methods which optimizes the BIC criterion in a...
Mixtures of common t-factor analyzers for clustering high-dimensional microarray data
MOTIVATION: Mixtures of factor analyzers enable model-based clustering to be undertaken for high-dimensional microarray data, where the number of observations n is small relative to the number of genes p. Moreover, when the number of clusters is not small, for example, where there are several different types of cancer, there may be the need to reduce further the number of parameters in the speci...
Mixtures of skew-t factor analyzers
In this paper, we introduce a mixture of skew-t factor analyzers as well as a family of mixture models based thereon. The mixture of skew-t distributions model that we use arises as a limiting case of the mixture of generalized hyperbolic distributions. Like their Gaussian and t-distribution analogues, our mixture of skew-t factor analyzers are very well-suited to the model-based clustering of ...